/********************* Notes for Census staff *******************************
All these inputs can be modified below, where we define globals for each input.

Indexing of input datasets (modify in line 30): 		currently set to 1, 2, 3, 4
Input dataset name (modify in the "use" statement of each loop, after "$datadir/"): 	ssb_v6_0_2_synthetic`i'.dta (modify the part before `i', which is the index)
Location of input dataset (modify in line 33): 			/rdcprojects/co/co00517/SSB/data/current
Location of code, output (modify in line 34): 			"/rdcprojects/co/co00517/SSB/programs/users/spec821/Single Implicate Log and Output"

We use the personid and spouse_personid variables. For easy modification, we create
   alternate variables "person_id" and "spouse_person_id", to be changed as desired
   (modify the two lines that create them inside each loop).

The only output of this code (besides log files) is the Excel file "sumstats_SSDI_SSB`i'",
   where i indexes over the GS files. All tabulations in this table are taken at a "group" level,
   where each of 4 groups has over 300 observations in this implicate. We report only median, 
   mean, and standard deviation.
   
This file is the only one that should be run.
   
****************************************************************************/

/*This do-file accomplishes the following two tasks
1. extracts the relevant sample under study (adapted from code by Tim Moore for the SSA)
2. calculates summary statistics on the sample (adapted from code by Matt Unrath on the public use SIPP)*/

set more off
clear

*generate a list of the 4 Gold Standard Completed Files
loc gs_list 1 2 3 4 // modify to how GS files are indexed

*set directories
global datadir /rdcprojects/co/co00517/SSB/data/current // location of input dataset
global codedir "/rdcprojects/co/co00517/SSB/programs/users/spec821/Single Implicate Log and Output" // location of code, output
cd "$codedir"
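
*Optional pre-flight check: a minimal sketch, not part of the original workflow.
*It assumes the input files follow the ssb_v6_0_2_synthetic#.dta naming used in the
*loops below, and stops with an error before any processing if one is missing.
foreach i of local gs_list {
	confirm file "$datadir/ssb_v6_0_2_synthetic`i'.dta"
}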

*This do-file runs ssb_ssdi.do on each input file, twice:
*  first on the full sample,
*  then on the non-imputed sample (observations with flag_valid_ssn==0 dropped).

*run on full sample

global impflag // intentionally blank for the full sample; set to _nimp for the non-imputed run

foreach i in `gs_list' {
*we want to use dataset from $datadir, then run ssb_ssdi.do

*load dataset into memory
use "$datadir/ssb_v6_0_2_synthetic`i'.dta", clear

/* Note: we use personid, spouse_personid variables. For easy modification,
   we generate alternate variables "person_id" and "spouse_person_id" to be changed
   as desired.*/
   
qui clonevar person_id = personid // clonevar keeps the storage type; plain gen defaults to float and can round large IDs
qui clonevar spouse_person_id = spouse_personid

*define implicate number as global so it can be passed through to the summary stats table
global gsnum `i'

*run single implicate code
do ssb_ssdi.do
}

*run on non-imputed sample

foreach i in `gs_list' {
*we want to use dataset from $datadir, then run ssb_ssdi.do

*load dataset into memory
use "$datadir/ssb_v6_0_2_synthetic`i'.dta", clear

drop if flag_valid_ssn==0 // restrict to the non-imputed sample

/* Note: we use personid, spouse_personid variables. For easy modification,
   we generate alternate variables "person_id" and "spouse_person_id" to be changed
   as desired.*/
   
qui clonevar person_id = personid // clonevar keeps the storage type; plain gen defaults to float and can round large IDs
qui clonevar spouse_person_id = spouse_personid

*define implicate number as global so it can be passed through to the summary stats table
global gsnum `i'
global impflag _nimp // marks this run as the non-imputed sample

*run single implicate code
do ssb_ssdi.do
}
